AITopics

2604.1956

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine Learning, Mar-31-2026

A Comparative Investigation of Thermodynamic Structure-Informed Neural Networks

Li, Guojie, Hong, Liu

Physics-informed neural networks (PINNs) offer a unified framework for solving both forward and inverse problems of differential equations, yet their performance and physical consistency strongly depend on how governing laws are incorporated. In this work, we present a systematic comparison of thermodynamic structure-informed neural networks built on different thermodynamic formulations, including Newtonian, Lagrangian, and Hamiltonian mechanics for conservative systems, as well as the Onsager variational principle and extended irreversible thermodynamics for dissipative systems. Through comprehensive numerical experiments on representative ordinary and partial differential equations, we quantitatively evaluate the impact of these formulations on accuracy, physical consistency, noise robustness, and interpretability. The results show that PINNs based on Newtonian residuals can reconstruct system states but fail to reliably recover key physical and thermodynamic quantities, whereas structure-preserving formulations significantly enhance parameter identification, thermodynamic consistency, and robustness. These findings provide practical guidance for the principled design of thermodynamically consistent models and lay the groundwork for integrating more general nonequilibrium thermodynamic structures into physics-informed machine learning.
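To make "how governing laws are incorporated" concrete, here is a minimal sketch (not the authors' code) of the residuals a PINN loss could penalize for a unit-mass harmonic oscillator with stiffness k: the Newtonian form q'' + kq = 0 versus the Hamiltonian form q' = ∂H/∂p, p' = −∂H/∂q with H = p²/2 + kq²/2. As simplifying assumptions, derivatives are taken by central finite differences on a known exact trajectory rather than by automatic differentiation of a trained network, and all function names and parameter values are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Traj = Callable[[float], float]
H_STEP = 1e-3  # finite-difference step (illustrative choice)


def d1(f: Traj, t: float, h: float = H_STEP) -> float:
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


def d2(f: Traj, t: float, h: float = H_STEP) -> float:
    """Central-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)


def newtonian_residual(q: Traj, t: float, k: float) -> float:
    """Residual of q'' + k q = 0 (unit mass); only q enters."""
    return d2(q, t) + k * q(t)


def hamiltonian_residuals(q: Traj, p: Traj, t: float, k: float) -> Tuple[float, float]:
    """Residuals of q' = dH/dp and p' = -dH/dq for H = p^2/2 + k q^2/2."""
    dH_dp = p(t)
    dH_dq = k * q(t)
    return d1(q, t) - dH_dp, d1(p, t) + dH_dq


def energy(q: Traj, p: Traj, t: float, k: float) -> float:
    """Hamiltonian H(q, p), constant along exact trajectories."""
    return 0.5 * p(t) ** 2 + 0.5 * k * q(t) ** 2


# A known exact solution stands in for a trained network's output.
k = 4.0
omega = math.sqrt(k)


def q(t: float) -> float:
    return math.cos(omega * t)


def p(t: float) -> float:
    return -omega * math.sin(omega * t)


ts: List[float] = [0.1 * i for i in range(1, 20)]
newton = max(abs(newtonian_residual(q, t, k)) for t in ts)
ham = max(max(abs(r) for r in hamiltonian_residuals(q, p, t, k)) for t in ts)
energies = [energy(q, p, t, k) for t in ts]
print(f"max Newtonian residual:   {newton:.1e}")
print(f"max Hamiltonian residual: {ham:.1e}")
print(f"energy spread:            {max(energies) - min(energies):.1e}")
```

Both residuals vanish (up to finite-difference error) on the exact trajectory; the Hamiltonian form additionally exposes the momentum p and the energy H as explicit quantities that a loss can constrain or monitor.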

artificial intelligence, deep learning, machine learning, (19 more...)

2603.26803

Country:

Asia > India (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-9-2026, 18:16:30 GMT

aa0d2a804a3510442f2fd40f2100b054-Paper.pdf

classifier, unlabeled data, variational principle, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine Learning, Oct-28-2025

An Analytic Theory of Quantum Imaginary Time Evolution

Chen, Min, Zhang, Bingzhi, Zhuang, Quntao, Liu, Junyu

The quantum imaginary time evolution (QITE) algorithm is one of the most promising variational quantum algorithms (VQAs), bridging the current era of Noisy Intermediate-Scale Quantum devices and the future of fully fault-tolerant quantum computing. Although practical demonstrations of QITE and its potential advantages over the general VQA trained with vanilla gradient descent (GD) in certain tasks have been reported, a first-principles theoretical understanding of QITE remains limited. Here, we aim to develop an analytic theory for the dynamics of QITE. First, we show that QITE can be interpreted as a form of a general VQA trained with Quantum Natural Gradient Descent (QNGD), where the inverse quantum Fisher information matrix serves as the learning-rate tensor. This equivalence is established not only at the level of gradient update rules, but also through the action principle: the variational principle can be directly connected to the geometric geodesic distance in the quantum Fisher information metric, up to an integration constant. Second, for wide quantum neural networks, we employ the quantum neural tangent kernel framework to construct an analytic model for QITE. We prove that QITE always converges faster than GD-based VQA, though this advantage is suppressed by the exponential growth of the Hilbert space dimension. This helps explain certain experimental results in quantum computational chemistry. Our theory encompasses linear, quadratic, and more general loss functions. We validate the analytic results through numerical simulations. Our findings establish a theoretical foundation for QITE dynamics and provide analytic insights for the first-principles design of variational quantum algorithms.
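The QNGD update mentioned above has the generic natural-gradient form θ ← θ − η F⁻¹∇L, where the inverse metric F⁻¹ acts as a learning-rate tensor. The plain-Python sketch below illustrates only that update rule: `metric` stands in for the quantum Fisher information matrix (which differs from the Fubini–Study metric by a constant factor, depending on convention), and the toy quadratic loss, the optional damping `eps`, and all function names are assumptions for illustration, not the paper's construction.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def natural_gradient_step(theta: Vector, grad: Vector, metric: Matrix,
                          lr: float, eps: float = 0.0) -> Vector:
    """theta <- theta - lr * (metric + eps * I)^(-1) grad."""
    n = len(theta)
    damped = [[metric[i][j] + (eps if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    direction = solve(damped, grad)
    return [t - lr * d for t, d in zip(theta, direction)]


def gradient_step(theta: Vector, grad: Vector, lr: float) -> Vector:
    """Vanilla gradient descent, for comparison."""
    return [t - lr * g for t, g in zip(theta, grad)]


# Toy loss L(theta) = 0.5 * theta^T A theta, so grad L = A theta.
A = [[4.0, 1.0], [1.0, 3.0]]
theta0 = [1.0, -2.0]
grad0 = [sum(A[i][j] * theta0[j] for j in range(2)) for i in range(2)]

ngd = natural_gradient_step(theta0, grad0, metric=A, lr=1.0)
gd = gradient_step(theta0, grad0, lr=0.2)
print("natural gradient:", ngd)
print("plain gradient:  ", gd)
```

Because the metric in this toy equals the loss Hessian, the natural-gradient step coincides with a Newton step and reaches the minimizer at the origin in one step; with a genuine quantum Fisher matrix the two generally differ.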

artificial intelligence, machine learning, qite, (14 more...)

2510.22481

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

Tabish, Mohammad, Leimkuhler, Benedict, Klus, Stefan

How deep is your network? Deep vs. shallow learning of transfer operators

arXiv.org Machine Learning, Sep-25-2025

We propose a randomized neural network approach called RaNNDy for learning transfer operators and their spectral decompositions from data. The weights of the hidden layers of the neural network are randomly selected and only the output layer is trained. The main advantage is that this approach significantly reduces training time and computational resources without a noticeable loss of accuracy, while avoiding common problems associated with deep learning such as sensitivity to hyperparameters and slow convergence. Additionally, the proposed framework allows us to compute a closed-form solution for the output layer which directly represents the eigenfunctions of the operator. Moreover, it is possible to estimate uncertainties associated with the computed spectral properties via ensemble learning. We present results for different dynamical operators, including the Koopman and Perron–Frobenius operators, which have important applications in analyzing the behavior of complex dynamical systems, as well as the Schrödinger operator. The numerical examples, which highlight the strengths but also weaknesses of the proposed framework, include several stochastic dynamical systems, protein folding processes, and the quantum harmonic oscillator.
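The core idea described above, a frozen random hidden layer with a closed-form output layer, can be illustrated with random-feature ridge regression. This is a simplified stand-in, not RaNNDy: it estimates the action of the Koopman operator on the single observable g(x) = x for the deterministic map x ↦ 0.8x, using random tanh features, an output bias, and a ridge penalty `lam` that are all illustrative assumptions. RaNNDy goes further and extracts eigenfunctions from the fitted output layer, which this sketch does not attempt.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def make_features(n_hidden: int, seed: int = 0) -> Callable[[float], Vector]:
    """Random, frozen hidden layer: phi(x) = [1, tanh(w_i x + b_i) ...]."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def phi(x: float) -> Vector:
        return [1.0] + [math.tanh(w * x + b) for w, b in zip(weights, biases)]

    return phi


def fit_output_layer(phi: Callable[[float], Vector], xs: Vector, ys: Vector,
                     lam: float) -> Vector:
    """Closed-form ridge solution c = (Phi^T Phi + lam I)^(-1) Phi^T y."""
    rows = [phi(x) for x in xs]
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(gram, rhs)


# Snapshot pairs (x_t, x_{t+1}) from the map x -> 0.8 x on [-1, 1].
xs = [-1.0 + 2.0 * i / 199 for i in range(200)]
ys = [0.8 * x for x in xs]  # g(x_{t+1}) with the observable g(x) = x

phi = make_features(n_hidden=20)
coef = fit_output_layer(phi, xs, ys, lam=1e-8)


def predict(x: float) -> float:
    """Approximation of (K g)(x) = g(0.8 x) from the fitted output layer."""
    return sum(c * f for c, f in zip(coef, phi(x)))


print(predict(0.5))  # close to 0.8 * 0.5 = 0.4
```

Only the 21 output coefficients are learned, by one linear solve; no gradient-based training of the hidden weights takes place.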

approximation, eigenfunction, operator, (17 more...)

2509.1993

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Denmark (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial Intelligence, Sep-16-2025

HalluField: Detecting LLM Hallucinations via Field-Theoretic Modeling

Vu, Minh, Tran, Brian K., Shah, Syed A., Zollicoffer, Geigh, Hoang-Xuan, Nhat, Bhattarai, Manish

Large Language Models (LLMs) exhibit impressive reasoning and question-answering capabilities. However, they often produce inaccurate or unreliable content known as hallucinations. This unreliability significantly limits their deployment in high-stakes applications. Thus, there is a growing need for a general-purpose method to detect hallucinations in LLMs. In this work, we introduce HalluField, a novel field-theoretic approach for hallucination detection based on a parametrized variational principle and thermodynamics. Inspired by thermodynamics, HalluField models an LLM's response to a given query and temperature setting as a collection of discrete likelihood token paths, each associated with a corresponding energy and entropy. By analyzing how energy and entropy distributions vary across token paths under changes in temperature and likelihood, HalluField quantifies the semantic stability of a response. Hallucinations are then detected by identifying unstable or erratic behavior in this energy landscape. HalluField is computationally efficient and highly practical: it operates directly on the model's output logits without requiring fine-tuning or auxiliary neural networks. Notably, the method is grounded in a principled physical interpretation, drawing analogies to the first law of thermodynamics. Remarkably, by modeling LLM behavior through this physical lens, HalluField achieves state-of-the-art hallucination detection performance across models and datasets.
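The thermodynamic analogy in the abstract can be made concrete at the level of a single next-token distribution. Assuming, as an illustrative choice that is not necessarily HalluField's definition, that each candidate token's energy is its negative logit, E_i = −z_i, the temperature-T softmax is a Boltzmann distribution, and the standard identities U = ⟨E⟩, S = −Σ p_i ln p_i and F = −T ln Z = U − TS hold exactly. The logits below are made up; HalluField's path-level energies, entropies, and stability score are not reproduced here.

```python
import math
from typing import Dict, List


def boltzmann_stats(logits: List[float], temperature: float) -> Dict[str, float]:
    """U, S (nats) and F for p = softmax(logits / T), with energies E_i = -logit_i."""
    energies = [-z for z in logits]
    e_min = min(energies)  # shift for numerical stability
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z_shifted = sum(weights)
    probs = [w / z_shifted for w in weights]
    log_z = math.log(z_shifted) - e_min / temperature  # undo the shift
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return {"U": mean_energy, "S": entropy, "F": -temperature * log_z}


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
stats = {T: boltzmann_stats(logits, T) for T in (0.5, 1.0, 2.0)}
for T, s in stats.items():
    print(f"T={T}: U={s['U']:.4f} S={s['S']:.4f} F={s['F']:.4f} "
          f"U-TS={s['U'] - T * s['S']:.4f}")
```

Entropy grows with temperature here, as it must for any Boltzmann distribution; tracking how such quantities shift as T changes is in the spirit of the stability analysis the abstract describes.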

large language model, natural language, variation, (14 more...)

2509.10753

Country: North America > United States > Colorado (0.28)

Genre: Research Report (1.00)

Industry:

Energy (0.93)
Government > Regional Government (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Lee-Jenkins, Christopher R.

Manifold Trajectories in Next-Token Prediction: From Replicator Dynamics to Softmax Equilibrium

arXiv.org Artificial Intelligence, Sep-1-2025

Decoding in large language models is often described as scoring tokens and normalizing with softmax. We give a minimal, self-contained account of this step as a constrained variational principle on the probability simplex. The discrete, normalization-respecting ascent is the classical multiplicative-weights (entropic mirror) update; its continuous-time limit is the replicator flow. From these ingredients we prove that, for a fixed context and temperature, the next-token distribution follows a smooth trajectory inside the simplex and converges to the softmax equilibrium. This formalizes the common "manifold traversal" intuition at the output-distribution level. The analysis yields precise, practice-facing consequences: temperature acts as an exact rescaling of time along the same trajectory, while top-k and nucleus sampling restrict the flow to a face with identical guarantees. We also outline a controlled account of path-dependent score adjustments and their connection to loop-like, hallucination-style behavior. We make no claims about training dynamics or internal representations; those are deferred to future work.
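Two of these claims can be checked numerically. Starting from the uniform distribution, each multiplicative-weights step p_i ← p_i·exp(η s_i) / Σ_j p_j·exp(η s_j) keeps p exactly on the curve t ↦ softmax(t·s), so after k steps p = softmax(kη·s); stopping at kη = 1/T gives softmax(s/T), one way to read temperature as a rescaling of time. Tokens that start with zero mass keep it, mirroring the remark that top-k and nucleus sampling restrict the flow to a face of the simplex. The scores, step count, and function names are illustrative assumptions, and the sketch does not reproduce the paper's proofs.

```python
import math
from typing import List, Optional


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax(scores / temperature)."""
    top = max(scores)
    w = [math.exp((s - top) / temperature) for s in scores]
    total = sum(w)
    return [x / total for x in w]


def multiplicative_weights(scores: List[float], steps: int, eta: float,
                           mask: Optional[List[bool]] = None) -> List[float]:
    """Entropic mirror-ascent steps from the uniform distribution on allowed tokens."""
    allowed = mask if mask is not None else [True] * len(scores)
    n_allowed = sum(allowed)
    p = [1.0 / n_allowed if a else 0.0 for a in allowed]
    for _ in range(steps):
        w = [pi * math.exp(eta * s) for pi, s in zip(p, scores)]
        total = sum(w)
        p = [x / total for x in w]
    return p


scores = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token scores
T = 0.7
steps = 100
eta = 1.0 / (steps * T)         # so that steps * eta = 1 / T

end = multiplicative_weights(scores, steps, eta)
gap = max(abs(a - b) for a, b in zip(end, softmax(scores, T)))
print(f"max |p_k - softmax(s/T)| = {gap:.1e}")

# Top-2 restriction: masked tokens start at zero and never leave that face.
top2 = multiplicative_weights(scores, steps, eta, mask=[True, True, False, False])
print(top2)
```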

artificial intelligence, machine learning, trajectory, (17 more...)

2508.21186

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.83)

Neural Information Processing Systems, Aug-15-2025, 16:38:35 GMT

A Variational Approach for Learning from Positive and Unlabeled Data

Chen, Hui

However, such a method easily leads to severe overfitting.

classifier, unlabeled data, variational principle, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Artificial Intelligence, Apr-15-2025

Dynamical symmetries in the fluctuation-driven regime: an application of Noether's theorem to noisy dynamical systems

Vastola, John J.

Department of Neurobiology, Harvard Medical School, Boston, MA, USA

Editors: Simone Azeglio, Christian Shewmake, Bahareh Tolooshams, Sophia Sanborn, Chase van de Geijin, Nina Miolane

Abstract

Noether's theorem provides a powerful link between continuous symmetries and conserved quantities for systems governed by some variational principle. Perhaps unfortunately, most dynamical systems of interest in neuroscience and artificial intelligence cannot be described by any such principle. On the other hand, nonequilibrium physics provides a variational principle that describes how fairly generic noisy dynamical systems are most likely to transition between two states; in this work, we exploit this principle to apply Noether's theorem, and hence learn about how the continuous symmetries of dynamical systems constrain their most likely trajectories. We identify analogues of the conservation of energy, momentum, and angular momentum, and briefly discuss examples of each in the context of models of decision-making, recurrent neural networks, and diffusion generative models.

Keywords: symmetry, invariance, Noether's theorem, stochastic processes, diffusion

1. Introduction

In physics, Noether's theorem provides a fundamental link between the symmetries of physical systems on the one hand, and conserved quantities like energy and momentum on the other hand (Noether, 1918; Kosmann-Schwarzbach et al., 2011; Neuenschwander, 2017). In its modern form, it uniquely associates (equivalence classes of) independent continuous symmetries, which can be formalized in terms of Lie groups and algebras, with (equivalence classes of) independent conserved quantities (Martinez Alonso, 1979; Olver, 1986, 1993; Brown, 2020).
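As a worked instance of the energy analogue, consider a one-dimensional Ornstein–Uhlenbeck process dx = −kx dt + √(2D) dW. Assuming the small-noise (Freidlin–Wentzell) action S[x] = (1/4D) ∫ (ẋ + kx)² dt as the variational principle, a textbook choice that may differ in detail from the functional used in the paper, the Lagrangian has no explicit time dependence, so Noether's theorem gives the conserved quantity E = ẋ ∂L/∂ẋ − L = (ẋ² − k²x²)/(4D). The Euler–Lagrange equation reduces to ẍ = k²x. The sketch integrates it with fixed-step RK4 and checks that E stays constant; all parameter values are arbitrary.

```python
from typing import List, Tuple

State = Tuple[float, float]  # (x, v) with v = dx/dt


def rhs(state: State, k: float) -> State:
    """Euler-Lagrange equation x'' = k^2 x as a first-order system."""
    x, v = state
    return v, k * k * x


def rk4_step(state: State, dt: float, k: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    x, v = state
    s1 = rhs((x, v), k)
    s2 = rhs((x + 0.5 * dt * s1[0], v + 0.5 * dt * s1[1]), k)
    s3 = rhs((x + 0.5 * dt * s2[0], v + 0.5 * dt * s2[1]), k)
    s4 = rhs((x + dt * s3[0], v + dt * s3[1]), k)
    return (x + dt / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0]),
            v + dt / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1]))


def noether_energy(state: State, k: float, D: float) -> float:
    """Conserved quantity E = v dL/dv - L = (v^2 - k^2 x^2) / (4 D)."""
    x, v = state
    return (v * v - k * k * x * x) / (4 * D)


k, D = 1.0, 0.5
state: State = (1.0, -0.5)  # arbitrary initial position and velocity
dt, steps = 1e-3, 2000
energies: List[float] = [noether_energy(state, k, D)]
for _ in range(steps):
    state = rk4_step(state, dt, k)
    energies.append(noether_energy(state, k, D))

print(f"E(0) = {energies[0]:.6f}, drift = {max(energies) - min(energies):.2e}")
```

Paths that simply follow the deterministic drift, ẋ = −kx, have E = 0, as do their time reverses ẋ = kx.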

artificial intelligence, machine learning, noether, (18 more...)

2504.09761

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.24)
North America > United States > New York > New York County (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

Sukumar, N., Acharya, Amit

Variational formulation based on duality to solve partial differential equations: Use of B-splines and machine learning approximants

arXiv.org Artificial Intelligence, Dec-2-2024

Many partial differential equations (PDEs) such as the Navier–Stokes equations in fluid mechanics, inelastic deformation in solids, and transient parabolic and hyperbolic equations do not have an exact, primal variational structure. Recently, a variational principle based on the dual (Lagrange multiplier) field was proposed. The essential idea in this approach is to treat the given PDE as constraints, and to invoke an arbitrarily chosen auxiliary potential with strong convexity properties to be optimized. This leads to requiring a convex dual functional to be minimized subject to Dirichlet boundary conditions on dual variables, with the guarantee that even PDEs that do not possess a variational structure in primal form can be solved via a variational principle. The vanishing of the first variation of the dual functional is, up to Dirichlet boundary conditions on dual fields, the weak form of the primal PDE problem with the dual-to-primal change of variables incorporated. We derive the dual weak form for the linear, one-dimensional, transient convection-diffusion equation. A Galerkin discretization is used to obtain the discrete equations, with the trial and test functions chosen as linear combinations of either RePU activation functions (shallow neural network) or B-spline basis functions; the corresponding stiffness matrix is symmetric. For transient problems, a space-time Galerkin implementation is used with tensor-product B-splines as approximating functions. Numerical results are presented for the steady-state and transient convection-diffusion equation, and transient heat conduction. The proposed method delivers sound accuracy for ODEs and PDEs, and rates of convergence are established in the $L^2$ norm and $H^1$ seminorm for the steady-state convection-diffusion problem.
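The duality idea has a simple finite-dimensional analogue. Suppose a discretized linear problem Au = f has a nonsymmetric A, as a finite-difference convection-diffusion operator does. Choosing the auxiliary potential ½|u − ū|² and treating Au = f as a constraint, the Lagrangian ½|u − ū|² − λᵀ(Au − f) yields the dual-to-primal map u = ū + Aᵀλ, and λ solves (AAᵀ)λ = f − Aū, whose matrix is symmetric positive definite even though A is not. This sketch uses central finite differences for −εu'' + u' = 1 on (0, 1) with u(0) = u(1) = 0, not the paper's Galerkin discretization with RePU or B-spline bases; ε, the grid size, and the base state ū = 0 are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def convection_diffusion_matrix(n: int, eps: float) -> Matrix:
    """Central differences for -eps u'' + u' on n interior nodes, h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2 * eps / h**2
        if i > 0:
            a[i][i - 1] = -eps / h**2 - 1 / (2 * h)
        if i < n - 1:
            a[i][i + 1] = -eps / h**2 + 1 / (2 * h)
    return a


n, eps = 20, 0.1
A = convection_diffusion_matrix(n, eps)
f = [1.0] * n
u_bar = [0.0] * n  # base state of the potential 0.5 * |u - u_bar|^2

# Dual problem: (A A^T) lam = f - A u_bar, with a symmetric matrix.
K = matmul(A, transpose(A))
rhs = [fi - ai for fi, ai in zip(f, matvec(A, u_bar))]
lam = solve(K, rhs)

# Dual-to-primal map: u = u_bar + A^T lam.
u = [ub + d for ub, d in zip(u_bar, matvec(transpose(A), lam))]

residual = max(abs(r - fi) for r, fi in zip(matvec(A, u), f))
symmetric = all(K[i][j] == K[j][i] for i in range(n) for j in range(n))
print(f"max |A u - f| = {residual:.2e}, K symmetric: {symmetric}")
```

The price of symmetry in this analogue is that the 2-norm condition number of AAᵀ is the square of that of A.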

artificial intelligence, deep learning, machine learning, (19 more...)

2412.01232

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Yolo County > Davis (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)